Locality Sensitive Clustering in High Dimensional Space

Authors

  • Haolin Gao
  • Bicheng Li
  • Gang Chen
  • Yongwei Zhao
Abstract

In high-dimensional space, many conventional clustering algorithms perform poorly in both effectiveness and efficiency, especially on image data sets. For example, k-means is widely used in image clustering, especially visual clustering, but its drawbacks, such as long clustering time and high memory cost, seriously limit its feasibility for large, incrementally growing image sets. To improve feasibility, we propose a Locality Sensitive Clustering method. First, multiple hash functions are generated. Second, data points are projected to obtain bucket indices. Third, a proper quantization interval is selected to merge the bucket indices, and a cluster label is assigned to each point. Experimental results show that on a synthetic data set this method performs almost as well as k-means, and on an image data set its accuracy is slightly lower than that of k-means. Its advantages are low memory cost, fast running speed, and support for incremental clustering. Locality Sensitive Clustering can therefore be used to cluster data, especially in high-dimensional space.
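
The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the hash family or the merging rule, so this sketch assumes Gaussian random projections as the hash functions and a simple floor division by the quantization interval to form bucket indices. The names make_hash_functions, bucket_index, assign_labels, interval, and table are all hypothetical.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Bucket = Tuple[int, ...]


def make_hash_functions(dim: int, num_hashes: int, seed: int = 0) -> List[List[float]]:
    """Step 1 (assumed form): one Gaussian random direction per hash function."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)]


def bucket_index(point: Sequence[float],
                 directions: Sequence[Sequence[float]],
                 interval: float) -> Bucket:
    """Step 2 (assumed form): project the point and quantize each projection.

    A larger interval makes buckets wider, so more points share an index.
    """
    return tuple(
        math.floor(sum(a * x for a, x in zip(direction, point)) / interval)
        for direction in directions
    )


def assign_labels(points: Sequence[Sequence[float]],
                  directions: Sequence[Sequence[float]],
                  interval: float,
                  table: Optional[Dict[Bucket, int]] = None) -> List[int]:
    """Step 3 (assumed form): every distinct bucket index becomes a cluster label.

    Passing the same table again lets new points be labelled incrementally
    without revisiting the points seen earlier.
    """
    if table is None:
        table = {}
    labels = []
    for point in points:
        key = bucket_index(point, directions, interval)
        if key not in table:
            table[key] = len(table)  # next unused label
        labels.append(table[key])
    return labels
```

In this sketch only the projection directions and the bucket-to-label table are kept in memory, and each point is processed once, which is one plausible reading of the low memory cost and incremental behaviour the abstract claims. Choosing interval trades cluster granularity against the number of clusters; the abstract does not say how the authors select it.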

Similar resources

HSEARCH: fast and accurate protein sequence motif search and clustering

Protein motifs are conserved fragments that occur frequently in protein sequences. They have significant functions, such as the active site of an enzyme. Searching and clustering protein sequence motifs is computationally intensive. Most existing methods are either not fast enough to analyze large data sets for motif finding or achieve low accuracy for motif clustering. We present a new protein sequence motif f...

Supervised Feature Extraction of Face Images for Improvement of Recognition Accuracy

Dimensionality reduction methods transform or select a low-dimensional feature space to efficiently represent the original high-dimensional feature space of the data. Feature reduction techniques are an important step in many pattern recognition problems in different fields, especially in the analysis of high-dimensional data. Hyperspectral images are acquired by remote sensors and human face images ar...

High-Dimensional Similarity Search Using Data-Sensitive Space Partitioning

Nearest neighbor search has a wide variety of applications. Unfortunately, the majority of search methods do not scale well with dimensionality. Recent efforts have been focused on finding better approximate solutions that improve the locality of data using dimensionality reduction. However, it is possible to preserve the locality of data and find exact nearest neighbors in high dimensions with...

Fast and Robust Subspace Clustering Using Random Projections

Over the past several decades, subspace clustering has been receiving increasing interest and continuous progress. However, due to the lack of scalability and/or robustness, existing methods still have difficulty in dealing with the data that possesses simultaneously three characteristics: high-dimensional, massive and grossly corrupted. To tackle the scalability and robustness issues simultane...

Locality sensitive hashing: A comparison of hash function types and querying mechanisms

It is well known that high-dimensional nearest-neighbor retrieval is very expensive. Dramatic performance gains are obtained using approximate search schemes, such as the popular Locality-Sensitive Hashing (LSH). Several extensions have been proposed to address the limitations of this algorithm, in particular, by choosing more appropriate hash functions to better partition the vector space. All...

Publication date: 2014